Set BWA and bwamem2 index memory dynamically #6628


Merged
merged 3 commits into master on Mar 11, 2025
Conversation

@edmundmiller (Contributor) commented Sep 11, 2024

Kept having bwamem2 index tasks that ran forever and failed.
Updated bwamem2 to use 28 B of memory per byte of fasta. Issue for reference: bwa-mem2/bwa-mem2#9

I also tracked down the required memory for bwa index while I was at it. It doesn't seem to fail in practice, because most genomes' requirements fall under the default memory.

This isn't the first place people have run into this: #6628
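The change can be sketched as a dynamic `memory` directive inside the module (a sketch of the approach, not the exact merged diff; the 28 bytes-per-byte factor comes from the linked bwa-mem2 issue):

```groovy
process BWAMEM2_INDEX {
    // Scale memory with the size of the FASTA on disk: bwa-mem2 index
    // needs roughly 28 bytes of RAM per byte of input fasta.
    memory { 28.B * fasta.size() }

    input:
    tuple val(meta), path(fasta)

    // ... rest of the module unchanged
}
```

Here `fasta.size()` returns the file size in bytes, and multiplying Nextflow's `28.B` memory unit by it yields a `MemoryUnit` the executor can request.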

@matthdsm (Contributor)

I like it 👍 it’ll also play nice with the new resourceLimits directive

@ewels (Member) commented Sep 11, 2024

I was talking to @drpatelh about this earlier this week. Sounds good. Very neat if it scales in such a linear way.

Should we add a baseline of additional memory?
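A baseline could be added as a constant offset on top of the linear term; a sketch with purely illustrative numbers (Nextflow's `MemoryUnit` supports `+` and `*`):

```groovy
// Sketch: a fixed baseline plus linear scaling with the fasta size
// (500.MB is an illustrative value, not a figure from this PR).
memory { 500.MB + 28.B * fasta.size() }
```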

@edmundmiller (Contributor, Author)

Closes nf-core/sarek#1377

@maxulysse (Member)

What can we do for bwa mem and bwamem2 mem?

@edmundmiller (Contributor, Author)

> What can we do for bwa mem and bwamem2 mem?

What do you mean?

@muffato (Member) commented Oct 16, 2024

Is it right to have these settings hardcoded in the module? How does it interact with a pipeline-level config file doing

process {
    withName: 'BWAMEM2_INDEX' {
        memory { ... }
    }
}

Which one takes precedence?

@matthdsm (Contributor)

AFAIK the pipeline config takes precedence over the config hardcoded in the module.
If you're worried about requesting too many resources, the resourceLimits directive should take care of that nicely.

@muffato (Member) commented Oct 16, 2024

I was more worried that 28 GB/Gbp is still too high, in my view. I use 24 GB/Gbp in my pipelines and wouldn't want nf-core to force me to waste RAM ;)
Also, your memory definition doesn't consider task.attempt. Are you absolutely certain that 28 GB/Gbp will work for every genome? nf-core resource definitions usually factor in task.attempt.

I'm not worried about losing check_max, since nf-core is about to mandate a recent Nextflow version that supports resourceLimits.
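The usual nf-core pattern being referred to combines the scaling factor with `task.attempt` and a retry strategy; a sketch (the factor and retry count are illustrative, not values from this PR):

```groovy
process BWAMEM2_INDEX {
    // Sketch: grow the request on each retry so an under-estimate for an
    // unusual genome doesn't kill the run outright.
    memory { 24.B * fasta.size() * task.attempt }
    errorStrategy 'retry'
    maxRetries 2
}
```

With `resourceLimits` set at the pipeline level, the retried request is still capped at whatever the infrastructure can actually provide.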

@muffato (Member) commented Oct 16, 2024

FYI, I've just checked our LSF logs and there have been zero memory failures across the 1,698 BWAMEM2_INDEX processes we ran in 2024 with 24 GB/Gbp.
The median memory efficiency is ~76% and goes up to 95%, meaning that 23 GB/Gbp might still work for all genomes (it's right at the limit), but 22 GB/Gbp would certainly yield some memory errors.

Regardless of the scaling factor you use, I'd still keep task.attempt just in case (I'm overcautious!).

@edmundmiller (Contributor, Author)

> Regardless of the scaling factor you use, I'd still keep task.attempt just in case (I'm overcautious!).

I think that's a great point; we should definitely add that to these.

Any opinions on the scaling factor? Should we really double it every time, or would 1.5x be generous enough?

> FYI, I've just checked our LSF logs and there have been zero memory failures across the 1,698 BWAMEM2_INDEX processes we ran in 2024 with 24 GB/Gbp.

Power to you! 😆 I vote we go with what the bwamem2 issue says it's expected to use, unless you have a better link to point at when people start asking why their jobs keep failing.

@edmundmiller (Contributor, Author)

We can revert this if it breaks stuff for some reason. Let's merge it and see if anyone has issues, since it's gone stale apart from @muffato's comments, which I addressed.

@edmundmiller edmundmiller merged commit b519b07 into master Mar 11, 2025
54 checks passed
@edmundmiller edmundmiller deleted the bwamemory branch March 11, 2025 18:45
@maxulysse (Member)

I've noticed a small issue with this when running on a small genome with Docker:

docker: Error response from daemon: Minimum memory limit allowed is 6MB.

But nothing that can't be fixed with a label selector

@edmundmiller (Contributor, Author)

We could also just fix it in the module: make sure the computed value isn't zero or less than 6 MB.

@maxulysse (Member)

We could take the max of the minimum requirement and the memory we compute there.
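That suggestion could be sketched as a floor on the computed request (the 6 MB minimum matches the Docker error above; the 28 B factor is the one from this PR):

```groovy
// Sketch: never request less than Docker's 6 MB minimum, even for
// tiny genomes where the linear estimate falls below it.
memory {
    def requested = 28.B * fasta.size()
    requested > 6.MB ? requested : 6.MB
}
```

Nextflow's `MemoryUnit` is comparable, so the ternary picks whichever of the two is larger.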
